Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuantao Gu

Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models

May 28, 2026

Jiawei Zhang, Ziyuan Liu, Leon Yan, Zhenyu Xiao, Yuantao Gu

Abstract:The distortion-perception (D-P) tradeoff is a fundamental phenomenon of Bayesian inverse problems, which characterizes the inherent tension between distortion performance and perceptual quality. Enabling flexible traversal of the D-P tradeoff at inference time is crucial for practical applications. Despite the recent success of diffusion models in zero-shot inverse problem solving, efficient and principled strategies for D-P traversal in diffusion-based inverse algorithms remain inadequately characterized. In this paper, we propose a stage-wise framework for realizing D-P traversal using a single diffusion model in zero-shot inverse problems. Our proposed method, termed MAP-RPS, starts with an MAP estimation stage that approximates the MMSE solution and provides a low-distortion initialization, followed by a re-noised posterior sampling stage that progressively improves perceptual quality. We provide theoretical analyses for both stages, establishing the validity and effectiveness of the proposed design. Furthermore, we extend MAP-RPS to the latent space, yielding LMAP-RPS, which enjoys broader applicability by leveraging large-scale pre-trained latent diffusion backbones. Extensive experiments demonstrate that MAP-RPS and LMAP-RPS enable more effective D-P traversal on various tasks, while also exhibiting strong performance as efficient solvers for real-world inverse problems.

* Accepted by ICML 2026

Via

Access Paper or Ask Questions

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

May 22, 2026

Musheng Li, Ziying Zhang, Cheng jin, Yuantao Gu

Abstract:Token-based time series large language models (TS-LLMs) have emerged as a promising direction for time series analysis and reasoning. However, prior studies largely overlook the inherent continuity and ordinality of time series tokens, which substantially limits model performance. In this paper, we argue that preserving these properties in time series token embeddings is crucial for the effectiveness of token-based TS-LLMs. To this end, we propose COM (Continuity and Ordinality Matter), a continuity- and ordinality-aware strategy that integrates geometric constraints into both the initialization and training stages. Empirical results on multiple time series analysis benchmarks demonstrate that COM consistently improves the performance of token-based TS-LLMs, achieving competitive results and strong generalizability. Code is available at https://anonymous.4open.science/r/COM .

Via

Access Paper or Ask Questions

Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models

Sep 26, 2025

Cheng Jin, Qitan Shi, Yuantao Gu

Abstract:Classifier-Free Guidance (CFG) is widely used to improve conditional fidelity in diffusion models, but its impact on sampling dynamics remains poorly understood. Prior studies, often restricted to unimodal conditional distributions or simplified cases, provide only a partial picture. We analyze CFG under multimodal conditionals and show that the sampling process unfolds in three successive stages. In the Direction Shift stage, guidance accelerates movement toward the weighted mean, introducing initialization bias and norm growth. In the Mode Separation stage, local dynamics remain largely neutral, but the inherited bias suppresses weaker modes, reducing global diversity. In the Concentration stage, guidance amplifies within-mode contraction, diminishing fine-grained variability. This unified view explains a widely observed phenomenon: stronger guidance improves semantic alignment but inevitably reduces diversity. Experiments support these predictions, showing that early strong guidance erodes global diversity, while late strong guidance suppresses fine-grained variation. Moreover, our theory naturally suggests a time-varying guidance schedule, and empirical results confirm that it consistently improves both quality and diversity.

* 24 pages, 10 figures

Via

Access Paper or Ask Questions

ReTrack: Data Unlearning in Diffusion Models through Redirecting the Denoising Trajectory

Sep 16, 2025

Qitan Shi, Cheng Jin, Jiawei Zhang, Yuantao Gu

Abstract:Diffusion models excel at generating high-quality, diverse images but suffer from training data memorization, raising critical privacy and safety concerns. Data unlearning has emerged to mitigate this issue by removing the influence of specific data without retraining from scratch. We propose ReTrack, a fast and effective data unlearning method for diffusion models. ReTrack employs importance sampling to construct a more efficient fine-tuning loss, which we approximate by retaining only dominant terms. This yields an interpretable objective that redirects denoising trajectories toward the $k$-nearest neighbors, enabling efficient unlearning while preserving generative quality. Experiments on MNIST T-Shirt, CelebA-HQ, CIFAR-10, and Stable Diffusion show that ReTrack achieves state-of-the-art performance, striking the best trade-off between unlearning strength and generation quality preservation.

Via

Access Paper or Ask Questions

M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation

Mar 25, 2025

Ziyuan Liu, Jiawei Zhang, Wenyu Wang, Yuantao Gu

Abstract:Most existing change detection (CD) methods focus on optical images captured at different times, and deep learning (DL) has achieved remarkable success in this domain. However, in extreme scenarios such as disaster response, synthetic aperture radar (SAR), with its active imaging capability, is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to effectively learn the cross-modal data distribution between optical and SAR images. To address this challenge, we propose a unified MultiModal CD framework, M$^2$CD. We integrate Mixture of Experts (MoE) modules into the backbone to explicitly handle diverse modalities, thereby enhancing the model's ability to learn multimodal data distributions. Additionally, we innovatively propose an Optical-to-SAR guided path (O2SP) and implement self-distillation during training to reduce the feature space discrepancy between different modalities, further alleviating the model's learning burden. We design multiple variants of M$^2$CD based on both CNN and Transformer backbones. Extensive experiments validate the effectiveness of the proposed framework, with the MiT-b1 version of M$^2$CD outperforming all state-of-the-art (SOTA) methods in optical-SAR CD tasks.

* 5 pages, 2 figures

Via

Access Paper or Ask Questions

Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

Mar 13, 2025

Jiawei Zhang, Ziyuan Liu, Leon Yan, Gen Li, Yuantao Gu

Figure 1 for Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

Figure 2 for Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

Figure 3 for Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

Figure 4 for Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation

Abstract:Diffusion models have demonstrated remarkable performance in modeling complex data priors, catalyzing their widespread adoption in solving various inverse problems. However, the inherently iterative nature of diffusion-based inverse algorithms often requires hundreds to thousands of steps, with performance degradation occurring under fewer steps which limits their practical applicability. While high-order diffusion ODE solvers have been extensively explored for efficient diffusion sampling without observations, their application to inverse problems remains underexplored due to the diverse forms of inverse algorithms and their need for repeated trajectory correction based on observations. To address this gap, we first introduce a canonical form that decomposes existing diffusion-based inverse algorithms into three modules to unify their analysis. Inspired by the linear subspace search strategy in the design of high-order diffusion ODE solvers, we propose the Learnable Linear Extrapolation (LLE) method, a lightweight approach that universally enhances the performance of any diffusion-based inverse algorithm that fits the proposed canonical form. Extensive experiments demonstrate consistent improvements of the proposed LLE method across multiple algorithms and tasks, indicating its potential for more efficient solutions and boosted performance of diffusion-based inverse algorithms with limited steps. Codes for reproducing our experiments are available at \href{https://github.com/weigerzan/LLE_inverse_problem}{https://github.com/weigerzan/LLE\_inverse\_problem}.

* preprint

Via

Access Paper or Ask Questions

JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework

Feb 19, 2025

Ziyuan Liu, Ruifei Zhu, Long Gao, Yuanxiu Zhou, Jingyu Ma, Yuantao Gu

Figure 1 for JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework

Figure 2 for JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework

Figure 3 for JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework

Figure 4 for JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework

Abstract:Deep learning has achieved significant success in the field of remote sensing image change detection (CD), yet two major challenges remain: the scarcity of sub-meter, all-inclusive open-source CD datasets, and the difficulty of achieving consistent and satisfactory detection results across images with varying change areas. To address these issues, we introduce the JL1-CD dataset, which contains 5,000 pairs of 512 x 512 pixel images with a resolution of 0.5 to 0.75 meters. Additionally, we propose a multi-teacher knowledge distillation (MTKD) framework for CD. Experimental results on the JL1-CD and SYSU-CD datasets demonstrate that the MTKD framework significantly improves the performance of CD models with various network architectures and parameter sizes, achieving new state-of-the-art results. The code is available at https://github.com/circleLZY/MTKD-CD.

* 14 pages, 9 figures. Submitted to IEEE Transactions on Geoscience and Remote Sensing (TGRS)

Via

Access Paper or Ask Questions

See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers

Nov 04, 2024

Jiaxin Zhuang, Leon Yan, Zhenwei Zhang, Ruiqi Wang, Jiawei Zhang, Yuantao Gu

Figure 1 for See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers

Figure 2 for See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers

Figure 3 for See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers

Figure 4 for See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers

Abstract:Time series anomaly detection (TSAD) is becoming increasingly vital due to the rapid growth of time series data across various sectors. Anomalies in web service data, for example, can signal critical incidents such as system failures or server malfunctions, necessitating timely detection and response. However, most existing TSAD methodologies rely heavily on manual feature engineering or require extensive labeled training data, while also offering limited interpretability. To address these challenges, we introduce a pioneering framework called the Time Series Anomaly Multimodal Analyzer (TAMA), which leverages the power of Large Multimodal Models (LMMs) to enhance both the detection and interpretation of anomalies in time series data. By converting time series into visual formats that LMMs can efficiently process, TAMA leverages few-shot in-context learning capabilities to reduce dependence on extensive labeled datasets. Our methodology is validated through rigorous experimentation on multiple real-world datasets, where TAMA consistently outperforms state-of-the-art methods in TSAD tasks. Additionally, TAMA provides rich, natural language-based semantic analysis, offering deeper insights into the nature of detected anomalies. Furthermore, we contribute one of the first open-source datasets that includes anomaly detection labels, anomaly type labels, and contextual description, facilitating broader exploration and advancement within this critical field. Ultimately, TAMA not only excels in anomaly detection but also provides a comprehensive approach for understanding the underlying causes of anomalies, pushing TSAD forward through innovative methodologies and insights.

* Under review

Via

Access Paper or Ask Questions

Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

Jun 11, 2024

Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

Abstract:The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primarily exploit the prior information within the diffusion models while neglecting their denoising capability. To bridge this gap, this work leverages the diffusion process to reframe noisy inverse problems as a two-variable constrained optimization task by introducing an auxiliary optimization variable. By employing gradient truncation, the projection gradient descent method is efficiently utilized to solve the corresponding optimization problem. The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework. Extensive experiments on the image restoration tasks and source separation and partial generation tasks demonstrate that ProjDiff exhibits superior performance across various linear and nonlinear inverse problems, highlighting its potential for practical applications. Code is available at https://github.com/weigerzan/ProjDiff/.

Via

Access Paper or Ask Questions

EPA: Neural Collapse Inspired Robust Out-of-Distribution Detector

Jan 03, 2024

Jiawei Zhang, Yufan Chen, Cheng Jin, Lei Zhu, Yuantao Gu

Abstract:Out-of-distribution (OOD) detection plays a crucial role in ensuring the security of neural networks. Existing works have leveraged the fact that In-distribution (ID) samples form a subspace in the feature space, achieving state-of-the-art (SOTA) performance. However, the comprehensive characteristics of the ID subspace still leave under-explored. Recently, the discovery of Neural Collapse ($\mathcal{NC}$) sheds light on novel properties of the ID subspace. Leveraging insight from $\mathcal{NC}$, we observe that the Principal Angle between the features and the ID feature subspace forms a superior representation for measuring the likelihood of OOD. Building upon this observation, we propose a novel $\mathcal{NC}$-inspired OOD scoring function, named Entropy-enhanced Principal Angle (EPA), which integrates both the global characteristic of the ID subspace and its inner property. We experimentally compare EPA with various SOTA approaches, validating its superior performance and robustness across different network architectures and OOD datasets.

* Accepted by ICASSP 2024

Via

Access Paper or Ask Questions